NSF PAR Search | NSF Public Access Repository

Dichotomous intronic polyadenylation profiles reveal multifaceted gene functions in the pan-cancer transcriptome

https://doi.org/10.1038/s12276-024-01289-w

Sun, Jiao; Kim, Jin-Young; Jun, Semo; Park, Meeyeon; de Jong, Ebbing; Chang, Jae-Woong; Cheng, Sze; Fan, Deliang; Chen, Yue; Griffin, Timothy J; et al (October 2024, Experimental & Molecular Medicine)

Alternative cleavage and polyadenylation within introns (intronic APA) generate shorter mRNA isoforms; however, their physiological significance remains elusive. In this study, we developed a comprehensive workflow to analyze intronic APA profiles using the mammalian target of rapamycin (mTOR)-regulated transcriptome as a model system. Our investigation revealed two contrasting effects within the transcriptome in response to fluctuations in cellular mTOR activity: an increase in intronic APA for a subset of genes and a decrease for another subset of genes. The application of this workflow to RNA-seq data from The Cancer Genome Atlas demonstrated that this dichotomous intronic APA pattern is a consistent feature in transcriptomes across both normal tissues and various cancer types. Notably, our analyses of protein length changes resulting from intronic APA events revealed two distinct phenomena in proteome programming: a loss of functional domains due to significant changes in protein length or minimal alterations in C-terminal protein sequences within unstructured regions. Focusing on conserved intronic APA events across 10 different cancer types highlighted the prevalence of the latter cases in cancer transcriptomes, whereas the former cases were relatively enriched in normal tissue transcriptomes. These observations suggest potential, yet distinct, roles for intronic APA events during pathogenic processes and emphasize the abundance of protein isoforms with similar lengths in the cancer proteome. Furthermore, our investigation into the isoform-specific functions of JMJD6 intronic APA events supported the hypothesis that alterations in unstructured C-terminal protein regions lead to functional differences. Collectively, our findings underscore intronic APA events as a discrete molecular signature present in both normal tissues and cancer transcriptomes, highlighting the contribution of APA to the multifaceted functionality of the cancer proteome.
Full Text Available
Aligner-D: Leveraging In-DRAM Computing to Accelerate DNA Short Read Alignment

https://doi.org/10.1109/JETCAS.2023.3241545

Zhang, Fan; Angizi, Shaahin; Sun, Jiao; Zhang, Wei; Fan, Deliang (March 2023, IEEE Journal on Emerging and Selected Topics in Circuits and Systems)

Full Text Available
“Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters

https://doi.org/10.18653/v1/2023.findings-emnlp.243

Wan, Yixin; Pu, George; Sun, Jiao; Garimella, Aparna; Chang, Kai-Wei; Peng, Nanyun (January 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)

Full Text Available
HAGEN: Homophily-Aware Graph Convolutional Recurrent Network for Crime Forecasting

https://doi.org/10.1609/aaai.v36i4.20338

Wang, Chenyu; Lin, Zongyu; Yang, Xiaochen; Sun, Jiao; Yue, Mingxuan; Shahabi, Cyrus (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

The goal of the crime forecasting problem is to predict different types of crimes for each geographical region (like a neighborhood or census tract) in the near future. Since nearby regions usually have similar socioeconomic characteristics which indicate similar crime patterns, recent state-of-the-art solutions constructed a distance-based region graph and utilized Graph Neural Network (GNN) techniques for crime forecasting, because the GNN techniques could effectively exploit the latent relationships between neighboring region nodes in the graph if the edges reveal high dependency or correlation. However, this distance-based pre-defined graph cannot fully capture crime correlation between regions that are far from each other but share similar crime patterns. Hence, to make a more accurate crime prediction, the main challenge is to learn a better graph that reveals the dependencies between regions in crime occurrences and meanwhile captures the temporal patterns from historical crime records. To address these challenges, we propose an end-to-end graph convolutional recurrent network called HAGEN with several novel designs for crime prediction. Specifically, our framework could jointly capture the crime correlation between regions and the temporal crime dynamics by combining an adaptive region graph learning module with the Diffusion Convolution Gated Recurrent Unit (DCGRU). Based on the homophily assumption of GNN (i.e., graph convolution works better where neighboring nodes share the same label), we propose a homophily-aware constraint to regularize the optimization of the region graph so that neighboring region nodes on the learned graph share similar crime patterns, thus fitting the mechanism of diffusion convolution. Empirical experiments and comprehensive analysis on two real-world datasets showcase the effectiveness of HAGEN.
Full Text Available
Multi-omics data integration by generative adversarial network

https://doi.org/10.1093/bioinformatics/btab608

Ahmed, Khandakar Tanvir; Sun, Jiao; Cheng, Sze; Yong, Jeongsik; Zhang, Wei (August 2021, Bioinformatics)
Robinson, Peter (Ed.)
Motivation: Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high throughput technologies, more comprehensive multi-omics data is now available that can effectively link the genotype to phenotype. However, the interactive relation of multi-omics datasets makes it particularly challenging to incorporate different biological layers to discover the coherent biological signatures and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network model to integrate two omics data and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuses them to generate synthetic data with better predictive signals. Results: Large-scale experiments on The Cancer Genome Atlas breast cancer, lung cancer and ovarian cancer datasets validate that (i) the model can effectively integrate two omics data (e.g. mRNA and microRNA expression data) and their interaction network (e.g. microRNA-mRNA interaction network). The synthetic omics data generated by the proposed model has a better performance on cancer outcome classification and patient survival prediction compared to original omics datasets. (ii) The integrity of the interaction network plays a vital role in the generation of synthetic data with higher predictive quality. Using a random interaction network does not allow the framework to learn meaningful information from the omics datasets and therefore results in synthetic data with weaker predictive signals. Availability and implementation: Source code is available at: https://github.com/CompbioLabUCF/omicsGAN. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
Computational Methods to Study Human Transcript Variants in COVID-19 Infected Lung Cancer Cells

https://doi.org/10.3390/ijms22189684

Sun, Jiao; Fahmi, Naima Ahmed; Nassereddeen, Heba; Cheng, Sze; Martinez, Irene; Fan, Deliang; Yong, Jeongsik; Zhang, Wei (September 2021, International Journal of Molecular Sciences)
Microbes and viruses are known to alter host transcriptomes by means of infection. In light of recent challenges posed by the COVID-19 pandemic, a deeper understanding of the disease at the transcriptome level is needed. However, research about transcriptome reprogramming by post-transcriptional regulation is very limited. In this study, computational methods developed by our lab were applied to RNA-seq data to detect transcript variants (i.e., alternative splicing (AS) and alternative polyadenylation (APA) events). The RNA-seq data were obtained from a publicly available source, and they consist of mock-treated and SARS-CoV-2 infected (COVID-19) lung alveolar (A549) cells. Data analysis results show that more AS events are found in SARS-CoV-2 infected cells than in mock-treated cells, whereas fewer APA events are detected in SARS-CoV-2 infected cells. A combination of conventional differential gene expression analysis and transcript variants analysis revealed that most of the genes with transcript variants are not differentially expressed. This indicates that no strong correlation exists between differential gene expression and the AS/APA events in the mock-treated or SARS-CoV-2 infected samples. These genes with transcript variants can be applied as another layer of molecular signatures for COVID-19 studies. In addition, the transcript variants are enriched in important biological pathways that were not detected in the studies that only focused on differential gene expression analysis. Therefore, the pathways may lead to new molecular mechanisms of SARS-CoV-2 pathogenesis.
Full Text Available
LncRNA PAINT is associated with aggressive prostate cancer and dysregulation of cancer hallmark genes

https://doi.org/10.1002/ijc.33569

Hasan, Md Faqrul; Ganapathy, Kavya; Sun, Jiao; Khatib, Ayman; Andl, Thomas; Soulakova, Julia N.; Coppola, Domenico; Zhang, Wei; Chakrabarti, Ratna (August 2021, International Journal of Cancer)
Full Text Available
AS-Quant: Detection and Visualization of Alternative Splicing Events with RNA-seq Data

https://doi.org/10.3390/ijms22094468

Fahmi, Naima Ahmed; Nassereddeen, Heba; Chang, Jaewoong; Park, Meeyeon; Yeh, Hsinsung; Sun, Jiao; Fan, Deliang; Yong, Jeongsik; Zhang, Wei (May 2021, International Journal of Molecular Sciences)

(1) Background: A simplistic understanding of the central dogma falls short in correlating the number of genes in the genome to the number of proteins in the proteome. Post-transcriptional alternative splicing contributes to the complexity of the proteome and is critical in understanding gene expression. mRNA-sequencing (RNA-seq) has been widely used to study the transcriptome and provides an opportunity to detect alternative splicing events among different biological conditions. Despite the popularity of studying transcriptome variants with RNA-seq, few efficient and user-friendly bioinformatics tools have been developed for the genome-wide detection and visualization of alternative splicing events. (2) Results: We propose AS-Quant (Alternative Splicing Quantitation), a robust program to identify alternative splicing events from RNA-seq data. We then extended AS-Quant to visualize the splicing events with short-read coverage plots along with complete gene annotation. The tool works in three major steps: (i) calculate the read coverage of the potential spliced exons and the corresponding gene; (ii) categorize the events into five different categories according to the annotation, and assess the significance of the events between two biological conditions; (iii) generate the short-read coverage plot for user-specified splicing events. Our extensive experiments on simulated and real datasets demonstrate that AS-Quant outperforms the other three widely used baselines, SUPPA2, rMATS, and diffSplice, for detecting alternative splicing events. Moreover, the significant alternative splicing events identified by AS-Quant between two biological contexts were validated by RT-PCR experiments. (3) Availability: AS-Quant is implemented in Python 3.0. Source code and a comprehensive user’s manual are freely available online.
Full Text Available
PIM-Aligner: A Processing-in-MRAM Platform for Biological Sequence Alignment

https://doi.org/10.23919/DATE48585.2020.9116303

Angizi, Shaahin; Sun, Jiao; Zhang, Wei; Fan, Deliang (March 2020, 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE))

Full Text Available
